fix(scraper): replace misleading 403 hint for AI Scraper Studio errors by anil-bd · Pull Request #5 · brightdata/cli

anil-bd · 2026-05-18T15:47:58Z

When a `bdata scraper create` succeeds on the template POST but the subsequent AI-trigger POST returns 429 (e.g. because the user hit the AI Flow parallel-job cap), the half-built `collector_id` is still printed. If that id is then passed to `bdata scraper run`, the API returns 403 with the body `{"error":"Collector does not have a template"}`.

Today the CLI maps any 403 to a fixed hint:

Hint: Access denied. Check your zone permissions in the control
      panel.

This sends the user 30+ minutes down a zone-permission rabbit hole that has nothing to do with the actual problem (the AI Flow never finished generating selectors for this collector). Observed multiple times during stress testing.

This change is structured so the AI Scraper Studio error vocabulary stays in the scraper command and does NOT leak into the shared HTTP client. `scrape`, `search`, `discover`, `pipelines`, and `browser` are unaffected.

Mechanism:

* `src/utils/client.ts` gains a generic `hints?: Body_hint[]` field on `Request_opts`. The pure helper `pick_hint(status, body, hints)` consults the caller's list first and falls back to the existing `ERROR_HINTS` status-code map. The shared client ships ZERO command-specific patterns.
* `src/commands/scraper.ts` defines `SCRAPER_BODY_HINTS`, which holds two patterns:
  - `/collector does not have a template/i` → AI generation didn't complete; re-run `scraper create`; web-UI URL for manual recovery.
  - `/cannot run more than \d+ jobs in parallel/i` → AI-Flow concurrent-job cap; serialise launches.

  Every `post`/`get` call in `handle_create_scraper`, `handle_run_scraper`, and `run_batch` passes `hints: SCRAPER_BODY_HINTS` so a 4xx from any of them is translated with the right vocabulary.
* Real zone-permission 403s (any body that doesn't match the scraper patterns) still get the original "Access denied" hint. The test 'does not consult ERROR_HINTS when an extra-hint pattern matches' locks this in.
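The lookup order described above can be sketched roughly as follows. The PR names `Body_hint`, `pick_hint`, `ERROR_HINTS`, and `SCRAPER_BODY_HINTS` but does not show their definitions, so the field names `pattern` and `hint`, the paraphrased hint strings, and the `{"error":"Forbidden"}` body are all assumptions made for illustration. Only the two regular expressions and the "Access denied" text come from the description.

```typescript
// Assumed shape: the PR does not show the real Body_hint definition.
type Body_hint = { pattern: RegExp; hint: string };

// Status-code fallback map. Only the 403 text is quoted in the PR.
const ERROR_HINTS: Record<number, string> = {
    403: 'Access denied. Check your zone permissions in the control panel.',
};

// Caller-supplied body patterns are checked first; if none match,
// the generic status-code hint (if any) is returned.
function pick_hint(
    status: number,
    body: string,
    hints: Body_hint[] = [],
): string | undefined {
    const match = hints.find((h) => h.pattern.test(body));
    return match ? match.hint : ERROR_HINTS[status];
}

// Command-local vocabulary. The regexes mirror the PR; the hint wording
// is paraphrased, not copied from the actual source.
const SCRAPER_BODY_HINTS: Body_hint[] = [
    {
        pattern: /collector does not have a template/i,
        hint: 'AI generation did not complete. Re-run scraper create.',
    },
    {
        pattern: /cannot run more than \d+ jobs in parallel/i,
        hint: 'AI Flow concurrent-job cap reached. Serialise launches.',
    },
];

const stub_body = '{"error":"Collector does not have a template"}';

// Body matches a scraper pattern, so the scraper hint is chosen.
console.log(pick_hint(403, stub_body, SCRAPER_BODY_HINTS));

// Hypothetical non-matching body: falls back to the generic 403 hint.
console.log(pick_hint(403, '{"error":"Forbidden"}', SCRAPER_BODY_HINTS));
```

Because the patterns are passed in per call, a command that supplies no `hints` keeps the status-only behaviour, which is how the shared client can stay free of scraper-specific wording.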

Tests: 8 unit tests for `client.pick_hint` using mock generic patterns (covers mechanism + asserts the shared client carries no scraper vocabulary in `ERROR_HINTS`), plus 5 scraper command tests asserting the scraper patterns are well-formed and travel via `hints` to `client.post` on every AI-Flow call. Two existing tests relaxed from strict opts-object matches to objectContaining-style. 58 / 58 tests in the affected files pass. The 9 pre-existing failures in unrelated suites (daemon, add-mcp, browser, discover, scrape) on `main` are unchanged by this PR.
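The PR does not show the test code, but the reason two tests had to be relaxed can be illustrated with a small sketch: once every call carries an extra `hints` field, an exact comparison against the old options object no longer holds, while a subset comparison still does. The helpers `contains_subset` and `strict_match` and the `timeout_ms` field are hypothetical stand-ins, not names from the repository or from any test framework.

```typescript
type Opts = Record<string, unknown>;

// True when every key in `expected` is present in `actual` with the
// same value (the idea behind an objectContaining-style matcher).
function contains_subset(actual: Opts, expected: Opts): boolean {
    return Object.keys(expected).every(
        (key) => key in actual && actual[key] === expected[key],
    );
}

// True only when both objects have the same keys and the same values.
function strict_match(actual: Opts, expected: Opts): boolean {
    return (
        Object.keys(actual).length === Object.keys(expected).length &&
        contains_subset(actual, expected)
    );
}

// Hypothetical options: what an older test expected versus what a call
// looks like once a `hints` list is attached.
const example_hints = [{ pattern: /example/i, hint: 'example hint' }];
const old_expectation: Opts = { timeout_ms: 1000 };
const opts_after_change: Opts = { timeout_ms: 1000, hints: example_hints };

console.log(strict_match(opts_after_change, old_expectation));    // false
console.log(contains_subset(opts_after_change, old_expectation)); // true
```

A subset check keeps the older tests focused on the fields they were written for, while the new scraper tests assert separately that `hints` is present.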


anil-bd force-pushed the fix/stub-collector-hint branch from a6957fa to 5157b51 on May 18, 2026 19:30


anil-bd wants to merge 1 commit into brightdata:main from anil-bd:fix/stub-collector-hint
